The Weighted Uni-Dimensional Similarities
Problem with Least Absolute Value Metric Is
NP-Hard

by 

Nathan  P.  Ritchey  and  Gerald  L.  Thompson 

ABSTRACT

The purpose of this paper is to prove that the weighted uni-dimensional
similarities problem with least absolute value metric (USPAM) is, in general,
NP-hard. In the first four sections of the paper, the USPAM problem and four
lemmas are presented which will be used in Section 6 to prove the main theorem
of this paper. It is shown that the simple max cut problem can, in a polynomial
number of steps, be converted into a special case of the USPAM problem, which
shows that the USPAM problem is NP-hard. Finally, some special cases of the
USPAM problem are described for which polynomial solutions exist.

1.  Problem  Description 

The USPAM problem can be defined as follows: Given $m$ attributes and
$m(m-1)/2$ measurements of the distances $d_{ij}$ between attributes $i$ and $j$ for
$i,j = 1,\ldots,m$ and $i < j$, find model locations $z_i$, $i = 1,\ldots,m$, on the real line so
that $W$ is minimized, where

$$W = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} w_{ij}\,\bigl|\,|z_i - z_j| - d_{ij}\,\bigr| \tag{1}$$

and $w_{ij} \ge 0$ is the weight attached to the deviation between the measured
distance $d_{ij}$ and the model distance $|z_i - z_j|$.

Geometrically, a positioning of the $m$ points on the number line is sought
such that the sum of the absolute values of the differences between the observed
distance $d_{ij}$ and the model distance $|z_i - z_j|$ between pairs $i$ and $j$ is
minimized. Problems such as this, but using a least squares metric, arise often
in economics and psychometrics (see Poole, 1984). As far as we know, this is the
first paper to consider an absolute value metric for the problem.
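
As a minimal sketch of objective (1), the following Python function evaluates $W$ for a given set of model locations. The function name `uspam_objective`, the 0-based indexing, and the dictionary layout keyed by pairs `(i, j)` with `i < j` are assumptions made for illustration; they do not come from the paper.

```python
from itertools import combinations

def uspam_objective(z, d, w=None):
    """Evaluate W = sum_{i<j} w_ij * | |z_i - z_j| - d_ij |  (equation (1)).

    z: list of model locations on the real line.
    d, w: dicts keyed by (i, j) with i < j (0-based, a hypothetical layout).
    If w is omitted, every weight is taken to be 1.
    """
    total = 0
    for i, j in combinations(range(len(z)), 2):
        weight = 1 if w is None else w[(i, j)]
        total += weight * abs(abs(z[i] - z[j]) - d[(i, j)])
    return total

# Three attributes whose measured distances fit the line exactly.
d = {(0, 1): 2, (0, 2): 5, (1, 2): 3}
print(uspam_objective([0, 2, 5], d))  # 0: every model distance matches
print(uspam_objective([0, 2, 4], d))  # 2: two distances are off by one
```

Because only differences of locations enter the sum, adding the same constant to every entry of `z` leaves the returned value unchanged.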


2.  A  Preliminary  Result 

LEMMA 1. If the observed distances $d_{ij}$, $i,j = 1,\ldots,m$, $j > i$, are integral,
then the optimal model location of each $z_i$, $i = 1,\ldots,m$, is at an integer location
on the real line.

PROOF. Consider an arbitrary ordering of the $z_i$'s, renumbered, if
necessary, so that $z_1 \le z_2 \le \cdots \le z_m$. The locally optimal solution for this
ordering can be found by solving the following linear program.

$$\min\ W = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \bigl(w_{ij}\,e^{+}_{ij} + w_{ij}\,e^{-}_{ij}\bigr) \tag{P}$$

subject to

$$z_j - z_i + e^{+}_{ij} - e^{-}_{ij} = d_{ij},$$

$$z_i,\ e^{+}_{ij},\ e^{-}_{ij} \ge 0 \quad \text{for all } i,j = 1,\ldots,m,\ i < j.$$

Notice that the coefficient matrix formed by this constraint set is the
transpose of a node-arc incidence matrix for a complete graph together with an
identity matrix and a negative identity matrix, which is well known to be
totally unimodular. Hence, the locally optimal placement of each point $z_i$,
for $i = 1,\ldots,m$, given by the solution to the linear program (P) for this ordering,
and in fact, for every feasible ordering, is integral. Since every local
solution occurs at an integer point, the global solution must occur at an
integer point, which completes the proof.
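
To make program (P) concrete, here is how it reads for $m = 3$. This is only the general form written out for three points (with $e^{+}_{ij}$ and $e^{-}_{ij}$ denoting the two nonnegative deviation variables of pair $(i,j)$, a notational assumption), not an additional result:

```latex
\begin{aligned}
\min\ W ={}& w_{12}\,(e^{+}_{12} + e^{-}_{12})
           + w_{13}\,(e^{+}_{13} + e^{-}_{13})
           + w_{23}\,(e^{+}_{23} + e^{-}_{23}) \\
\text{subject to}\quad
  & z_2 - z_1 + e^{+}_{12} - e^{-}_{12} = d_{12}, \\
  & z_3 - z_1 + e^{+}_{13} - e^{-}_{13} = d_{13}, \\
  & z_3 - z_2 + e^{+}_{23} - e^{-}_{23} = d_{23}, \\
  & z_1, z_2, z_3 \ge 0, \qquad e^{+}_{ij},\ e^{-}_{ij} \ge 0 .
\end{aligned}
```

Each constraint row carries a $+1$ on $z_j$ and a $-1$ on $z_i$, which is the incidence pattern of arc $(i,j)$ in the complete graph on three nodes, while the deviation variables contribute an identity block and a negative identity block.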

Remark  1.  Since  there  are  m!/2  different  orderings,  the  global  solution 
to  (1)  can  be  found  by  solving  that  many  linear  programs  (P).  However,  this  is 
a  very  inefficient  solution  procedure. 

Remark  2.  It  is  always  possible  to  shift  any  local  solution  up  or  down  the 
number  line  and  not  change  the  value  of  the  objective  function.  This  fact  is 
obviously  true  since  the  objective  function  involves  only  distances  between 
pairs  of  points. 

Remark 3. From Lemma 1 and Remark 2 it follows that, after shifting, any
feasible ordering has clusters of model locations at $r+1$ integer points
$0, 1, \ldots, r$ on the real line, with $k_0$ positions located at 0, $k_1$ located at
1, etc., and $k_0 + k_1 + \cdots + k_r = m$.

3.  Binary  Data 

Suppose that all of the observed weights and distances are required to be
binary, that is, $w_{ij} = 0$ or 1 and $d_{ij} = 0$ or 1 for $i,j = 1,\ldots,m$ and $j > i$.
The uni-dimensional similarities problem with absolute value metric and binary
data is called the binary USPAM problem.

LEMMA 2. For any binary USPAM problem, there exists at least one optimal
solution such that $z_i = 0$ or 1, for all $i = 1,\ldots,m$.

PROOF. By Remark 2 above, we can shift any solution so that there is at
least one $i$ such that $z_i = 0$ and that there is no other index $k$ such that
$z_k < 0$. From Lemma 1, we know that a global solution to this problem has
integral values. Assume that an optimal solution has been found and that there
exists at least one $h$ such that $z_h$ is neither 0 nor 1. Therefore, $z_h \ge 2$.

By Remark 3 we know there are $r+1$ model locations on the real line with
$k_0$ at 0, $k_1$ at 1, etc., with $k_0 + k_1 + \cdots + k_r = m$. If $r \ge 2$ we will show
that it is possible to move all the $k_r$ model positions from location $r$ to
$r-2$ without increasing the objective function (1). Let $z_j$ be located at
integer $r$ and $z_i$ be located at integer $k < r$. There are two cases: $k = r-1$
and $k \le r-2$.


For the first case, $k = r-1$, so that when $z_j$ is located at $r$

$$\bigl|\,|z_j - z_i| - d_{ij}\,\bigr| = \bigl|\,|r - r + 1| - d_{ij}\,\bigr| = |1 - d_{ij}|$$

and when $z_j$ is located at $r-2$ we have

$$\bigl|\,|z_j - z_i| - d_{ij}\,\bigr| = \bigl|\,|r - 2 - r + 1| - d_{ij}\,\bigr| = |1 - d_{ij}|$$

which is the same.


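For the second case, $r - k \ge 2$, so that when $z_j$ is located at $r$ we
have

$$\bigl|\,|z_j - z_i| - d_{ij}\,\bigr| = \bigl|\,|r - k| - d_{ij}\,\bigr|$$

and when $z_j$ is located at $r-2$

$$\bigl|\,|z_j - z_i| - d_{ij}\,\bigr| = \bigl|\,|r - 2 - k| - d_{ij}\,\bigr|.$$

In the tightest case, $r - k = 2$, the first expression is $|2 - d_{ij}|$ and the
second is $\bigl|\,|2 - 2| - d_{ij}\,\bigr| = d_{ij}$. Because $d_{ij}$ is a binary variable

$$|2 - d_{ij}| \ge d_{ij}$$

which shows that moving $z_j$ from $r$ to $r-2$ does not increase the objective
function in this case either. If $r - k > 2$, then $r - 2 - k \ge 1 \ge d_{ij}$, so the
term drops from $(r - k) - d_{ij}$ to $(r - 2 - k) - d_{ij}$.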

All the other terms in (1), which do not involve the model location $r$, stay
the same, so the objective function is not increased by moving $z_j$ from $r$ to
$r-2$.

By  repeatedly  applying  this  argument  we  can  find  an  optimal  solution  which 
uses  only  the  model  locations  0  and  1,  completing  the  proof. 
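
Lemma 2 can be checked numerically on small instances. The sketch below is an illustration only, not part of the proof: it draws random binary instances and compares the best objective value over locations restricted to {0, 1} with the best value over the larger integer grid {0, 1, 2, 3}. The helper names, the instance size m = 5, and the random seed are all arbitrary choices.

```python
import random
from itertools import combinations, product

def uspam_objective(z, d, w):
    """W from equation (1); d and w are dicts keyed by pairs (i, j), i < j."""
    return sum(w[p] * abs(abs(z[p[0]] - z[p[1]]) - d[p]) for p in d)

def best_value(m, d, w, locations):
    """Minimum of W over all assignments of the m points to `locations`."""
    return min(uspam_objective(z, d, w) for z in product(locations, repeat=m))

random.seed(0)  # arbitrary seed, used only for reproducibility
m = 5
pairs = list(combinations(range(m), 2))
for _ in range(20):
    d = {p: random.randint(0, 1) for p in pairs}
    w = {p: random.randint(0, 1) for p in pairs}
    # Restricting to {0, 1} should lose nothing against {0, 1, 2, 3}.
    assert best_value(m, d, w, (0, 1)) == best_value(m, d, w, (0, 1, 2, 3))
print("restriction to {0, 1} was lossless on all sampled instances")
```

The grid stops at 3 only to keep the enumeration small; the argument in the proof applies to any largest location $r \ge 2$.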


4.  Graphical  Interpretation 

The binary USPAM problem can be interpreted as a problem on the graph
$G(N,E)$, whose node set is $N = \{1,\ldots,m\}$ and whose edge set is $E = \{(i,j) \mid i \ne j$
and $d_{ij} = 1\}$. A feasible solution to the binary USPAM problem is a partition of
$N$ into two disjoint subsets $S_0$ and $S_1$, with $S_0$ containing the indices $i$ such
that $z_i = 0$ and $S_1$ containing those with $z_i = 1$. In order to construct a
partition that minimizes (1), try to use the following rules:

(a) If $d_{ij} = 1$ for two nodes $i$ and $j$, place $i$ in one set and $j$ in
the other.

(b) If $d_{ij} = 0$, place $i$ and $j$ in the same set.

An  optimal  partition  is  one  that  violates  these  two  rules  a  minimum  number  of 
times,  since  violating  either  rule  causes  a  penalty  of  1  in  the  objective 
function  (1). 


It is easy to see that for some problems, this procedure cannot satisfy
both of these rules for all pairs of indices. Consider the following simple
example: Let $d_{ij} = 1$ for $i,j = 1,2,3$ and $j > i$. Graph $G$ is a triangle because,
since $d_{ij} = 1$ for every pair, edges (1,2), (1,3) and (2,3) are in $E$. The
objective function for this problem is

$$W = \bigl|1 - |z_1 - z_2|\bigr| + \bigl|1 - |z_1 - z_3|\bigr| + \bigl|1 - |z_2 - z_3|\bigr|.$$

The minimum value of $W$ is 1 and can be obtained (for instance) by choosing
$z_1 = 0$, $z_2 = 1$, and $z_3 = 1$. It is impossible to satisfy both rules (a) and (b) for
this example.
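
The triangle example is small enough to enumerate by hand or by machine. The following sketch (the function name `triangle_w` is hypothetical) lists $W$ for all eight 0/1 placements, relying on Lemma 2 to restrict attention to locations 0 and 1:

```python
from itertools import product

def triangle_w(z1, z2, z3):
    """Objective for the triangle example: all d_ij = 1, unit weights."""
    return (abs(1 - abs(z1 - z2))
            + abs(1 - abs(z1 - z3))
            + abs(1 - abs(z2 - z3)))

# Enumerate every placement of the three points on locations 0 and 1.
values = {z: triangle_w(*z) for z in product((0, 1), repeat=3)}
for z, value in sorted(values.items()):
    print(z, value)
print("minimum W:", min(values.values()))  # minimum W: 1
```

The two placements that put all three points together give $W = 3$; every other placement splits the points two against one and gives $W = 1$.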

Let $\bar{E} = \{(i,j) \mid i \ne j$ and $(i,j) \notin E\}$ be the complement of $E$; then it
follows that

$$|E| + |\bar{E}| = m(m-1)/2$$

where $|X|$ is the number of elements in set $X$.

Given a partition of $N$ into two subsets $S_0$ and $S_1$, define the following:

(a) $(i,j)$ is external if $i \in S_0$ and $j \in S_1$, or $j \in S_0$ and $i \in S_1$.

(b) $(i,j)$ is internal if $i$ and $j$ belong to $S_0$ or $i$ and $j$ belong to $S_1$.

For the same partition of $N$ we define

$E_X$ the set of external edges of $E$

$E_I$ the set of internal edges of $E$

$\bar{E}_X$ the set of external edges of $\bar{E}$

$\bar{E}_I$ the set of internal edges of $\bar{E}$.

From these definitions it follows that

$$|E_X| + |E_I| + |\bar{E}_X| + |\bar{E}_I| = m(m-1)/2. \tag{2}$$

Using  these  definitions  it  is  obvious  that  the  binary  USPAM  problem  can  be 
restated  as  follows. 

LEMMA 3. The binary USPAM problem is to choose $S_0$ and $S_1$ so as to
optimize either of the two objective functions

(a) Minimize $W = |E_I| + |\bar{E}_X|$

(b) Maximize $Z = |E_X| + |\bar{E}_I|$.

PROOF.  Statement  (a)  follows  from  the  graphical  interpretation  of  the 
problem.  The  equivalence  of  the  two  objective  functions  follows  by  rewriting 
(2)  as 

$$|E_X| + |\bar{E}_I| = m(m-1)/2 - |E_I| - |\bar{E}_X|$$

completing the proof.
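
The counting behind Lemma 3 can be illustrated with a short sketch. It assumes unit weights on every pair, uses 0-based node labels, and the function names `edge_counts` and `direct_w` are hypothetical. It classifies each pair as an external or internal member of $E$ or of its complement, then compares $|E_I| + |\bar{E}_X|$ with a direct evaluation of (1).

```python
from itertools import combinations

def edge_counts(m, edges, s1):
    """Return (|E_X|, |E_I|, |Ebar_X|, |Ebar_I|) for the partition (S_0, S_1).

    edges: set of pairs (i, j), i < j, for which d_ij = 1.
    s1: set of nodes placed at location 1; all other nodes are in S_0.
    """
    ex = ei = bx = bi = 0
    for i, j in combinations(range(m), 2):
        external = (i in s1) != (j in s1)
        if (i, j) in edges:
            if external:
                ex += 1
            else:
                ei += 1
        elif external:
            bx += 1
        else:
            bi += 1
    return ex, ei, bx, bi

def direct_w(m, edges, s1):
    """Objective (1) with unit weights for the 0/1 placement given by s1."""
    z = [1 if i in s1 else 0 for i in range(m)]
    return sum(abs(abs(z[i] - z[j]) - (1 if (i, j) in edges else 0))
               for i, j in combinations(range(m), 2))

m = 4
edges = {(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)}
s1 = {1, 3}
ex, ei, bx, bi = edge_counts(m, edges, s1)
print(ex, ei, bx, bi)                # 4 1 0 1
print(direct_w(m, edges, s1))        # 1, equal to |E_I| + |Ebar_X|
print(m * (m - 1) // 2 - (ei + bx))  # 5, equal to Z = |E_X| + |Ebar_I|
```

Here the only violated rule is edge (0, 2), whose endpoints both sit in $S_0$, so $W = 1$ and $Z = 6 - 1 = 5$.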

LEMMA 4. An optimal solution to the binary USPAM problem gives $W = 0$ if
and only if $G$ is a complete bipartite graph and $S_0$ and $S_1$ are chosen
accordingly.

PROOF. A complete bipartite graph $G = (N,E)$ has the property that $N$
can be partitioned into two subsets $N_0$ and $N_1$ such that $E$ consists exactly
of all of the edges $(i,j)$ with $i \in N_0$ and $j \in N_1$. If we choose $S_0 = N_0$ and
$S_1 = N_1$, then $W = 0$. Conversely, if $G = (N,E)$ and there exist subsets $S_0$ and $S_1$
such that $W = 0$, then $|E_I| = 0$ and $|\bar{E}_X| = 0$, so that $E_I = \emptyset$ and $\bar{E}_X = \emptyset$, which
implies that $E = E_X$ and $\bar{E} = \bar{E}_I$, so that $G$ is a complete bipartite graph.

Remark 4. From the objective function in Lemma 3(a) another interpretation
of the binary USPAM problem can be stated as follows. Given a graph $G$, choose
a partition $S_0$, $S_1$ of $N$ so that if a complete bipartite graph is constructed
by adding the edges in $\bar{E}_X$ and deleting those in $E_I$, the smallest total number
of changes must be made.
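
The "number of changes" reading of Remark 4 can be tried out by brute force on tiny graphs. The sketch below assumes unit weights and 0-based node labels; `min_changes` is a hypothetical helper that enumerates all $2^m$ partitions, so it is only usable for very small $m$.

```python
from itertools import combinations, product

def min_changes(m, edges):
    """Smallest |E_I| + |Ebar_X| over all partitions, with one best labeling.

    edges: set of pairs (i, j), i < j. Returns (value, z), where z[i] is the
    side (0 or 1) assigned to node i.
    """
    best = None
    for z in product((0, 1), repeat=m):
        changes = 0
        for i, j in combinations(range(m), 2):
            same_side = z[i] == z[j]
            if (i, j) in edges and same_side:
                changes += 1  # edge of E inside one set: would be deleted
            elif (i, j) not in edges and not same_side:
                changes += 1  # missing edge across the sets: would be added
        if best is None or changes < best[0]:
            best = (changes, z)
    return best

# K_{2,3}: already complete bipartite, so no changes are needed.
k23 = {(i, j) for i in (0, 1) for j in (2, 3, 4)}
print(min_changes(5, k23)[0])    # 0

# Path 0-1-2-3: bipartite but not complete bipartite; one change suffices.
path4 = {(0, 1), (1, 2), (2, 3)}
print(min_changes(4, path4)[0])  # 1
```

For the path, the single change is adding edge (0, 3) under the partition {0, 2}, {1, 3}, which turns the path into $K_{2,2}$.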

5.  The  Simple  Maximum  Cut  Problem 

Closely related to the binary USPAM problem is the simple maximum cut
problem, which can be stated using the same notation as follows. Given a graph
$G = (N,E)$, find a partition $S_0$ and $S_1$ of $N$ so that the number of external
arcs, $|E_X|$, connecting the two sets is maximized.

Although the maximum cut problem is known to be, in general, NP-complete
(Garey and Johnson, 1979), there is a considerable literature about the problem.
Grötschel, et al., 1988, provide a good literature review of the problem. See
also Barahona, 1983; Barahona, et al., 1985; Barahona, et al., 1986; and
Fonlupt, et al., 1984.
6. Conversion of the Simple Maximum Cut Problem into a Binary USPAM Problem

We now show that any simple max cut problem can be converted into a binary USPAM
problem in a polynomial number of steps, which will show that the binary USPAM
problem is NP-hard.

THEOREM.  The  uni-dimensional  similarities  problem  is,  in  general,  NP-hard. 

PROOF. Consider the simple maximum cut problem defined on a graph $G =
(N,E)$. We show it is a special type of binary USPAM problem on the same graph
with $d_{ij} = w_{ij} = 1$ if $(i,j) \in E$ and $d_{ij} = w_{ij} = 0$ if $(i,j) \in \bar{E}$. The objective function
for the binary USPAM problem is that given in Lemma 3(b). Using the weights
defined above it is

$$\text{Maximize } Z = 1 \cdot |E_X| + 0 \cdot |\bar{E}_I| = |E_X| \tag{3}$$

because the weights are zero on edges $(i,j) \in \bar{E}_I$. Since the objective function in
(3) is exactly that of the simple max cut problem, the proof is complete.
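
The conversion in the proof can be sketched in a few lines of Python. All function names here are hypothetical, node labels are 0-based, and both solvers are exponential brute-force enumerations intended only to illustrate the correspondence on a tiny graph. Since the weights vanish on non-edges, the minimum of $W$ over 0/1 placements equals the minimum of $|E_I|$, so the maximum cut equals $|E|$ minus that minimum.

```python
from itertools import combinations, product

def uspam_from_graph(n, edges):
    """Binary USPAM data from a graph: d_ij = w_ij = 1 on edges, 0 elsewhere."""
    d, w = {}, {}
    for pair in combinations(range(n), 2):
        d[pair] = w[pair] = 1 if pair in edges else 0
    return d, w

def min_binary_uspam(n, d, w):
    """Minimum of objective (1) over all placements on locations 0 and 1."""
    return min(
        sum(w[p] * abs(abs(z[p[0]] - z[p[1]]) - d[p]) for p in d)
        for z in product((0, 1), repeat=n)
    )

def max_cut_brute_force(n, edges):
    """Largest number of edges crossing any two-way partition of the nodes."""
    return max(
        sum(1 for i, j in edges if side[i] != side[j])
        for side in product((0, 1), repeat=n)
    )

# A 5-cycle; edges are stored as pairs (i, j) with i < j.
edges = {(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)}
d, w = uspam_from_graph(5, edges)
w_min = min_binary_uspam(5, d, w)
cut = max_cut_brute_force(5, edges)
print(cut, w_min, len(edges) - w_min)  # 4 1 4
```

Building the USPAM data takes one pass over the $n(n-1)/2$ pairs, which is the polynomial part of the conversion; the enumeration afterwards is only there to confirm the two optimal values agree.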

There are certain instances where the USPAM problem can be solved in
polynomial time. For example, if the data is perfect (i.e., the optimal
objective function value is $W = 0$), the positions of any pair of points, $i$ and $j$,
at a distance of $d_{ij}$ away from each other will uniquely determine the
positions of the rest of the points. Also, for the simple max cut problem,
Barahona and Mahjoub, 1986, have presented a polynomial time algorithm for
solving it on any graph, $G$, not contractible to $K_5$. This algorithm, called the
separation algorithm, uses the ellipsoid method to solve LP problems. Earlier,
Grötschel and Pulleyblank, 1981, presented an algorithm for weakly bipartite
graphs, which include graphs not contractible to $K_5$ and having nonnegative
weights. Also, Orlova and Dorfman, 1972, and Hadlock, 1975, have used matching
techniques and planar duality to solve planar max cut problems in polynomial
time.


7. Conclusions


We have shown that the uni-dimensional similarities problem is, in general,
NP-hard and have given several instances for which a polynomial solution method
exists. In order to obtain a solution for a specific problem having general
data, we could develop heuristic procedures or convert the problem into a 0-1
mixed integer program (see Ritchey, 1989). Many of the heuristics developed for
the traveling salesman problem can be used in a solution process for this
problem. Regardless of the chosen method of solution, at each iteration of a
method, an $L_1$ estimation problem must be solved. Therefore, it is likely that
good solutions to a larger problem ($m > 20$) will be computationally expensive
to obtain.